Extraction and Oxford Nanopore sequencing of genomic DNA from filamentous Actinobacteria

Summary Actinomycetota (Actinobacteria) is an ecologically and industrially important phylum from which pure high-molecular-weight (HMW) DNA is challenging to extract. This protocol provides a parallelized, cost-effective, and straightforward approach for consistently extracting pure HMW DNA using modified, non-toxic commercial kits suitable for higher-throughput applications. We further provide a workflow for sequencing and assembly of complete genomes using an optimized Oxford Nanopore rapid barcoding protocol and Illumina data error correction.

In this step, the actinobacterial culture is grown in a liquid medium to generate sufficient biomass for DNA extraction. The culture should produce a minimum of 250 mg cell pellet.
1. In a laminar air flow (LAF) bench, inoculate 50 mL of liquid ISP2 in a 300 mL baffled flask with the strain of interest from either an agar plate or directly from a cryostock.
Note: This protocol has been used successfully with liquid ISP2, YEME, MS and ATCC-2.
2. Incubate the culture at 30°C and 140 rpm until the culture can yield two pellets of 250 mg each (typically 2-7 days, but it may take longer).
Note: The growth rate will be highly dependent on the strain and conditions. If no growth is observed after 14 days, consult troubleshooting, problem 1.

Discard contaminated cultures
Timing: 30 min + growth
Contamination often appears in the early growth phase since contaminants (such as Escherichia coli and Bacillus spp.) typically grow at a faster rate than Actinobacteria. If contamination is present, one or more of the following signs may be observed in the culture one day after inoculation:
3. Under an optical microscope, the typical filamentous morphology of Actinobacteria is not observed (see Figure 1).
4. High optical density, measured as OD600 > 2 on a cell density meter, indicates rapid growth of the culture within hours of inoculation.
5. Place the culture on the table and observe the sedimentation of the cells. Actinobacteria will sediment faster than E. coli.
6. Once sedimented, tilt the flask to one side and observe the migration of the particles. Actinobacteria move faster, whereas contaminants leave a trail behind them.
7. In a LAF bench, streak the culture on an ISP2 agar plate (with the same medium as the previous culture).
a. After 1 or 2 days in an incubator at 30°C, observe the morphology of the colonies:

Note: If the colonies have a glossy appearance, the culture may be contaminated.
Alternatives: Use a toothpick to scrape the surface. If the colony texture is slimy and slippery, it is likely contaminated, since most Actinobacteria colonies will start to crumble or be quite difficult to pull off the surface.

Cell harvesting
Timing: 2-4 h (28-56 samples)
In this step, biomass is collected from liquid culture to prepare it for DNA extraction. In the end, this should result in 250 mg of harvested cell pellet per sample.
8. Inspect the growth of your liquid culture and note whether the cells tend to grow in an aggregated or turbid manner (Figure 2).

a. In an aggregated culture, the cells form small lumps and are not suspended in the medium. Thus, the culture exhibits little to no turbidity, instead growing in a flocculent or sedimented manner. This is a common growth pattern for Actinobacteria.
Note: Aggregation of the cells decreases the area available to enzymes during cell wall degradation, and as such, they must be ground to powder in liquid nitrogen (LN) as part of the extraction process.
b. In a turbid culture, the cells grow suspended in the medium, producing the opaque/cloudy cultures common for many lab-grown bacterial cultures. Since the cells are not heavily aggregated, there is plenty of surface area available for enzymatic activity.
9. Transfer a 2 mL aliquot from the liquid culture into an Eppendorf tube.
Note: If the culture is aggregated, it is often easier to collect the mycelium using a P1000 pipette tip with the end cut off. This will allow the biomass to enter the pipette tip.
10. Centrifuge the sample at 13,000 × g for 1 min and discard the supernatant.
11. Add 500 µL of phosphate-buffered saline (PBS, pH 7.4) and resuspend the pellet by vortexing for 10-20 s.
12. Centrifuge the sample again at 13,000 × g for 1 min and discard the supernatant. If there is little to no pellet (less than 150 mg), see troubleshooting, problem 1.
13. Store the harvested cells at −80°C for at least 12 h or up to several months.

Cell lysis
Timing: 1-2 h hands-on + 4 h of incubation (28-56 samples)
In this step, DNA is extracted by enzymatic lysis, while RNA and proteins are degraded by treatment with RNase A and proteinase K. Samples that grew in an aggregated manner need to be ground to powder in LN before enzymatic treatment. As such, they take longer to process, and grouping them together is advisable.
14. Prepare fresh lysis mix for the number of samples according to the lysis mix recipe in Buffers and solutions.
15. For each sample, dispense 3.5 mL lysis mix into a 50 mL Falcon tube.
16. Add the frozen cell pellet to the lysis mix, according to the culture's mode of growth:
a. If the sample grew aggregated, grind it into a powder in LN (see Methods video S1: Physical disruption of cellular aggregates by grinding in liquid nitrogen cooled mortar).
i. For each sample, submerge a clean mortar and pestle in LN in an LN-compatible ice pan.
CRITICAL: Handle LN with appropriate safety gear (cryo-gloves, safety goggles, lab coat, closed-toe shoes, and long pants).
ii. Once cool, remove the mortar and pestle from the ice pan.
Note: Leaving a small amount of LN in the mortar will ensure the sample stays sufficiently cold.
iii. Place the still frozen pellet into the mortar and grind it into a fine powder.
CRITICAL: The pellet must remain at the temperature of the LN to minimize DNA shearing.
iv. Scrape the powdered cell pellet into the Falcon tube with lysis mix.
b. If the sample was turbid, simply add the frozen bacterial pellet to the Falcon tube with lysis mix.
17. Homogenize the sample by vortexing for 10 s.
18. Incubate the sample at 37°C and 50 rpm for 2 h in a horizontal position.
Note: This extended incubation allows for complete lysis of the cell walls. The speed of the shaking should not exceed 50 rpm but can be lowered to minimize DNA shearing further.
19. Add 1.2 mL Buffer B2 to each sample.
20. Incubate the sample at 50°C and 50 rpm for 2 h.
Note: This extended incubation allows proteinase K to degrade cellular proteins completely.
Pause point: The now lysed samples can be stored at 4°C for ~12 h.

DNA purification and precipitation
Timing: 4-6 h (28-56 samples)
In this step, the extracted DNA is purified using QIAGEN Genomic-tip 20/G columns and afterwards precipitated with isopropanol to concentrate the DNA.
21. Place QF Buffer at 50°C until needed.
22. Vortex each sample for 10 s.
23. Centrifuge the samples at 8,000 × g for 10 min.
Note: This pellets left-over cellular debris, which increases the purity of the extracted DNA and avoids clogging columns.
24. For each sample, place a QIAGEN Genomic-tip 20/G column into a 15 mL Falcon tube using the plastic spacer which comes with the kit.
25. Equilibrate the 20/G columns with 1 mL of QBT Buffer.
26. Load 2-3 mL of the supernatant by pouring from the samples onto the columns. Take care to avoid the cell debris, as it can clog the columns.
Note: If possible, avoid using a pipette to transfer the liquid to minimize shearing.
27. Wait for the samples to pass through the columns (up to 20 min). If the samples do not pass through, see troubleshooting, problem 2.
28. Add 2 mL QC Buffer to each column and let it pass through.
29. Add an additional 2 mL QC Buffer to each column and let it pass through.
30. Discard the flow-through and place the columns into new clean 15 mL Falcon tubes.
31. Elute the DNA with 2 mL preheated QF Buffer.
32. Discard the columns and save the Falcon tubes with eluted DNA.
Pause point: The eluted samples can be stored at 4°C for ~12 h.
33. Prepare 2 mL of 70% (v/v) ethanol per sample by diluting absolute ethanol with nuclease-free water. Place the prepared ethanol on ice or in the freezer (−20°C).
34. Precipitate the DNA by adding 1.4 mL room temperature (~24°C) isopropanol.
35. Close the tubes and mix each sample gently by inverting the tube slowly until completely mixed (precipitated DNA may be observed as shown in Figure 3).
CRITICAL: Avoid freezing HMW DNA after it has been purified to minimize DNA shearing.
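When many samples are processed in parallel, the wash ethanol and isopropanol volumes scale linearly with the sample count. The following is a minimal sketch of that arithmetic, not part of the published workflow: the function names are hypothetical, and the 0.7 volume ratio is simply derived from the 1.4 mL of isopropanol added to 2 mL of eluate in steps 31 and 34. No pipetting overage is included.

```python
def ethanol_mix(n_samples, final_ml_per_sample=2.0, percent=70.0):
    """Return (absolute ethanol mL, nuclease-free water mL) for the 70% wash ethanol.

    Assumes simple v/v mixing and no overage; add extra volume as needed.
    """
    total_ml = n_samples * final_ml_per_sample
    ethanol_ml = total_ml * percent / 100.0
    return round(ethanol_ml, 2), round(total_ml - ethanol_ml, 2)


def isopropanol_volume(eluate_ml=2.0, ratio=0.7):
    """Isopropanol volume for a given eluate volume (1.4 mL per 2 mL eluate)."""
    return round(eluate_ml * ratio, 2)


if __name__ == "__main__":
    ethanol, water = ethanol_mix(28)
    print(f"28 samples: {ethanol} mL absolute ethanol + {water} mL water")
    print(f"Isopropanol per sample: {isopropanol_volume()} mL")
```

For a batch of 28 samples, this corresponds to 56 mL of 70% ethanol in total.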

Quality control
Timing: 5 h (for 28-56 samples)
In this step, the quality and quantity of the purified DNA are assessed by its concentration, absorbance ratios, and degree of fragmentation. DNA quality control should not be performed before the DNA has had ample time to resuspend (a minimum of 16 h at 4°C).
45. Determine the DNA concentration of the sample by measuring 2 µL of sample using the Qubit dsDNA broad range assay.
Note: The ideal concentration is approximately 50 ng/µL in a volume of at least 20 µL. For samples with concentrations below 10 ng/µL, see troubleshooting, problem 4.
Note: If the samples fall outside the acceptable ranges (a 260/280 ratio between 1.8 and 2.0 and a 260/230 ratio between 1.9 and 2.2; Table 3), see troubleshooting, problem 6.
48. Store samples at 4°C after QC.

DNA library preparation
Timing: 2 h (12-16 samples)
In this step, each sample's HMW DNA is sheared and has a unique barcode attached using the protocol for the SQK-RBK110.96 kit (Version: RBK_9126_v110_revF 24 Mar 2021) to generate 12-16 DNA libraries that are subsequently pooled into a multiplex sample for sequencing. Ideally, samples with concentrations of ~50 ng/µL (± 10 ng/µL) should be used to promote equimolarity. If the samples do not meet these guidelines, it is advisable to dilute the samples at least one day before preparing the libraries, allowing the HMW DNA to become suspended in the final volume at 4°C, and to remeasure the DNA concentration, as HMW DNA dilutions are highly inaccurate.
Note: The SQK-RBK110.96 protocol has been optimized for the present workflow by modifying the following elements:
(1) The initial amount of HMW DNA per sample has been increased from 50 ng to 600 ng and is adjusted to a volume of 27 µL with 10 mM Tris-HCl pH 8.0 (instead of 9 µL and nuclease-free water).
(2) To maintain the volume proportion, 3 µL of Rapid Barcode is added to the diluted samples. Hence, the concentration of HMW DNA in relation to barcode has increased by a factor of 4 in comparison with the commercial protocol.
(3) For the SPRI clean-up step, the quantity of beads has been modified from 1× to 0.4× the library volume, to remove small fragments and other impurities. In addition, the release of HMW DNA during elution was increased by incubation for 2 min at 50°C.
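The input adjustment in modification (1) and the factor of 4 in modification (2) can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the function names are hypothetical, and the 1 µL barcode volume for the commercial protocol is inferred from the stated 9 µL sample volume and the "volume proportion" being maintained (9:1 = 27:3).

```python
TARGET_NG = 600.0   # HMW DNA input per sample in the modified protocol
TARGET_UL = 27.0    # sample volume before barcode addition
BARCODE_UL = 3.0    # Rapid Barcode volume in the modified protocol


def input_volumes(conc_ng_per_ul):
    """Return (DNA µL, 10 mM Tris-HCl µL) giving 600 ng in 27 µL."""
    dna_ul = TARGET_NG / conc_ng_per_ul
    if dna_ul > TARGET_UL:
        raise ValueError("sample too dilute to reach 600 ng in 27 µL")
    return round(dna_ul, 2), round(TARGET_UL - dna_ul, 2)


def dna_to_barcode_fold_change():
    """Ratio of DNA per µL of barcode: modified versus commercial protocol."""
    commercial = 50.0 / 1.0            # 50 ng with 1 µL barcode (inferred)
    modified = TARGET_NG / BARCODE_UL  # 600 ng with 3 µL barcode
    return modified / commercial


def spri_bead_volume(library_ul, ratio=0.4):
    """Bead volume for the 0.4x clean-up of a pooled library."""
    return round(library_ul * ratio, 2)


if __name__ == "__main__":
    print(input_volumes(50.0))           # sample at the ideal 50 ng/µL
    print(dna_to_barcode_fold_change())  # 4.0
```

A sample at 50 ng/µL therefore needs 12 µL of DNA and 15 µL of Tris-HCl, while any sample below roughly 22 ng/µL cannot reach 600 ng within 27 µL.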
49. Preheat the thermocycler to 30°C and the heating block to 50°C.
50. Handle kit components as shown in
Note: If the concentration is above 100 ng/µL, dilute with EB buffer to a final concentration of 50 ng/µL in a volume of 13 µL. Measure the concentration again using 2 µL of sample in Qubit.
69. Add 1 µL of RAP F to the remaining 11 µL of barcoded DNA.
70. Mix gently by flicking the tube, and spin down.
71. Incubate the reaction for 5 min at room temperature (~24°C).
72. Store the pooled library on ice until ready to load on the flow cell.

Starting the flow cell sequencing run
Note: After starting the sequencing run, it is advisable to monitor the initial results after approximately 1 h to ensure that the majority of pores are sequencing samples, and to verify that the fragment length N50 is > 10 kb.

Priming and loading the SpotON flow cell
Note: There might be an elevated number of pores sequencing adapters and short fragments shortly after starting a run. These will generally be depleted, and pores will sequence an increasingly high fraction of sample DNA until a maximum is reached after several hours.
86. Let the data generation continue for ~16 h and, the next day, check the amount of data generated per barcode.
a. In total, 8-12 Gbases should be generated, or 500-800 Mbases per sample.
Note: It is very hard to achieve even barcode distribution, so generating excess data will ensure that libraries with less data will assemble. Between 10% and 20% of data will not have a barcode assigned.
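To judge whether a run has produced enough data, the total yield can be converted into an expected per-sample yield and an approximate coverage. The sketch below assumes a perfectly even barcode distribution, which the note above says is rarely achieved, so it gives an average rather than a guarantee. The function names and the 8 Mb genome size used in the example are illustrative assumptions.

```python
def per_sample_yield_mb(total_gb, n_samples, unassigned_fraction):
    """Average Mbases per barcode after removing reads without a barcode."""
    assigned_mb = total_gb * 1000.0 * (1.0 - unassigned_fraction)
    return assigned_mb / n_samples


def expected_coverage(yield_mb, genome_mb):
    """Approximate fold coverage for a genome of the given size."""
    return yield_mb / genome_mb


if __name__ == "__main__":
    # Hypothetical run: 10 Gbases, 16 barcodes, 15% of data unassigned.
    mb = per_sample_yield_mb(10.0, 16, 0.15)
    cov = expected_coverage(mb, 8.0)  # 8 Mb genome is an assumed example size
    print(f"{mb:.0f} Mbases per sample, about {cov:.0f}x coverage")
```

Samples falling well below the average are the ones most likely to need the additional data described in troubleshooting, problem 7.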
87. Continue to initial assembly of data, freeing up the flow cell to sequence new samples.

Initial de novo assembly of data
Timing: 4 h (for step 88)

For complete de novo assembly of an Actinobacteria genome using Nanopore sequencing, a coverage > 50× and a read N50 > 10 kb are desirable, though some genomes can assemble from less data or shorter reads.
88. Open a terminal and navigate to the run folder. Then, in each barcode folder, run:
89. To check the stats inside the generated folder, run:
90. Inspect the coverage, circularity, and edges of each generated contig. A complete contig is either circular or is denoted with a '*' before and after the repeat-graph path of each contig (see Figure 6):
91. Visualize the assembly repeat graph using Bandage:
Note: If the assembly is incomplete, see troubleshooting, problem 7.
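The inspection in step 90 can also be scripted. The following minimal sketch is not the command used in this protocol. It assumes the tab-separated column layout of Flye's assembly_info.txt (seq_name, length, cov., circ., repeat, mult., alt_group, graph_path) and applies the completeness rule stated above: a contig is treated as complete if it is circular or if its graph path starts and ends with '*'. The function name and the example rows are made up for illustration.

```python
def parse_assembly_info(text):
    """Parse the contents of a Flye assembly_info.txt file into a list of dicts."""
    contigs = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header line
        fields = line.split("\t")
        path = fields[7].split(",")
        terminal_both_ends = len(path) >= 2 and path[0] == "*" and path[-1] == "*"
        circular = fields[3] == "Y"
        contigs.append({
            "name": fields[0],
            "length": int(fields[1]),
            "coverage": float(fields[2]),
            "circular": circular,
            "multiplicity": int(fields[5]),
            "complete": circular or terminal_both_ends,
        })
    return contigs


# Made-up example rows, not data from this study.
EXAMPLE = "\n".join([
    "#seq_name\tlength\tcov.\tcirc.\trepeat\tmult.\talt_group\tgraph_path",
    "contig_1\t8500000\t72\tN\tN\t1\t*\t*,1,*",
    "contig_2\t120000\t210\tY\tN\t3\t*\t2",
    "contig_3\t45000\t68\tN\tY\t2\t*\t3",
])

if __name__ == "__main__":
    for contig in parse_assembly_info(EXAMPLE):
        status = "complete" if contig["complete"] else "check repeat graph"
        print(contig["name"], contig["length"], contig["coverage"], status)
```

Contigs flagged for checking still need manual inspection of the repeat graph, since this rule does not recognize an inverted repeat placed on its own contig.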

Illumina data generation and hybrid assembly
Timing: 4 h for hybrid genome assembly (for step 94)
To achieve a high-quality assembly, it is necessary to supplement the Oxford Nanopore 9.4.1 data with Illumina data, and then use this data to polish the complete genome sequence generated with the long-read data.
92. Generate or order Illumina library and sequencing at a service provider.
Note: As input DNA requirements are much more flexible for Illumina sequencing than for nanopore sequencing, commercial service providers can easily produce the needed amount and quality of data.
a. Use a protocol without transposase shearing and with a low number of PCR cycles, such as the NEBNext Ultra DNA Library Prep Kit with only 6 PCR cycles.
b. Sequence with 2 × 150 nt paired-end chemistry and coverage of ~50× for each genome on an Illumina NovaSeq machine.
Note: These suggestions are based on the high GC content of Actinobacteria. If they are not followed, some genomic regions will not be covered by the Illumina data and will thus be left unpolished.
93. Generate the hybrid assembly:
a. Use the Flye assembler 3 with a total of five rounds of polishing with the long-read data.
b. Polish the assembly, first with Polypolish 5 and subsequently with POLCA. 6
Note: Using Polypolish is particularly important for genomes like Streptomyces, where a large, inverted repeat carrying biosynthetic gene clusters will otherwise have approximately 1 insertion or deletion error per 1,000 nt sequence, which severely limits analysis.
Note: Generation of Illumina data in collaboration with a service provider is generally expected to take several weeks.
Note: The fragmented and incomplete BUSCO genes should be < 5% combined, and the number of duplicated BUSCO genes should be < 20. Some duplicated BUSCO genes are expected and are likely a result of self-resistance. 11
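The BUSCO thresholds in the note above can be applied programmatically once the category counts are known. This is a small sketch with a hypothetical function name. It assumes that "incomplete" corresponds to BUSCO's "missing" category, which is an interpretation rather than something the protocol states explicitly.

```python
def busco_passes(fragmented, missing, duplicated, total,
                 max_frag_missing_pct=5.0, max_duplicated=20):
    """Return True if BUSCO counts meet the thresholds given in the note.

    'missing' is assumed to be what the protocol calls 'incomplete'.
    """
    frag_missing_pct = 100.0 * (fragmented + missing) / total
    return frag_missing_pct < max_frag_missing_pct and duplicated < max_duplicated


if __name__ == "__main__":
    # Hypothetical counts for a lineage dataset of 356 genes.
    print(busco_passes(fragmented=2, missing=3, duplicated=8, total=356))
```

An assembly that fails either threshold is worth re-examining for incomplete polishing or redundant contigs before downstream analysis.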

EXPECTED OUTCOMES
HMW DNA extraction
The main goal of the lysis and purification protocol is to obtain HMW DNA of sufficient concentration and quality for sequencing of the complete bacterial genome. Following precipitation of HMW DNA, the molecules can in most cases be visually confirmed (see Figure 3), providing an indication of a successful extraction and a high concentration, although not seeing DNA is not an indication of a failed extraction.
Due to the heterogeneity between samples, the concentrations will also differ. Generally, sample concentrations range from 20 to 300 ng/µL. Table 3 shows the quality of 14 samples processed according to the protocol. In terms of purity, a sample should have an A260/A280 ratio of ~1.8 and an A260/A230 ratio above 2.0. We accept 260/280 ratios between 1.8 and 2.0 and 260/230 ratios between 1.9 and 2.2. If the sample falls outside this range, see troubleshooting, problem 6.
Samples that pass quality control should have HMW DNA fragments of approximately 50 kb in length (as reflected in the pulsed-field gel electrophoresis picture in Figure 5).
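The acceptance criteria above lend themselves to a simple per-sample check when many extractions are processed together. The following sketch uses a hypothetical function name and applies only the numeric thresholds stated in this protocol (concentration below 10 ng/µL, 260/280 between 1.8 and 2.0, 260/230 between 1.9 and 2.2). Fragment length still has to be judged from the gel.

```python
def qc_problems(conc_ng_per_ul, a260_280, a260_230):
    """Return a list of QC issues, each pointing to a troubleshooting problem."""
    problems = []
    if conc_ng_per_ul < 10:
        problems.append("low concentration: see problem 4")
    if not 1.8 <= a260_280 <= 2.0:
        problems.append("A260/A280 outside 1.8-2.0: see problem 6")
    if not 1.9 <= a260_230 <= 2.2:
        problems.append("A260/A230 outside 1.9-2.2: see problem 6")
    return problems


if __name__ == "__main__":
    # Hypothetical measurements for two samples.
    print(qc_problems(55.0, 1.85, 2.10))  # no issues expected
    print(qc_problems(8.0, 1.60, 2.10))   # low concentration and low 260/280
```

An empty list means the sample meets the numeric criteria and can proceed to the fragmentation check.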

Assembly result
Obtaining a complete, high-quality assembled genome is the expected final result. This should be a single contig for the bacterial chromosome. Additionally, there may be other contigs for plasmids, with one contig per replicon. As previously mentioned, Streptomyces usually have linear genomes with inverted repeat and telomeric structures (Figure 6).
Generally, it should be possible to assemble complete genomes from samples with a coverage of 50× (this can be found at the end of the log file produced by Flye and in the repeat-graph file produced by Flye) and with an N50 of > 10 kb (Table 4, Figure 6).
There are two genomic topologies in Actinobacteria, linear and circular. In the assembly repeat graph, these topologies are expected to be represented in one of the following ways:
Circular chromosome: Flye generates a repeat-graph path of a single contig from the chromosome with a single edge, marked as circular (see Sample A, Figure 6).
Linear chromosomes without inverted repeats at the ends: Flye describes the repeat-graph path in a single contig of a single edge with a (*) symbol that denotes a terminal graph node. An example of this type can be seen in Sample D, Figure 6, which also has a linear plasmid placed in a second contig. Note that the plasmid contig has a multiplicity of three, indicating a copy number of 3.
Note: The telomeric chromosome ends can be present even if the repeat graph does not show them if they are resolved during the assembly, for example if the read length greatly exceeds the IR and telomeric repeats.
Linear chromosomes with inverted repeat and telomeric structures at the end: Flye often places the inverted repeat in the repeat graph on a separate contig instead of at both ends of the chromosome (see Sample E, Figure 6; troubleshooting, problem 7).
(Table footnote: Ratios outside of the desired range are indicated with an asterisk (*). The fragmentation of the same samples is depicted in Figure 4.)

Note: Streptomyces usually have linear chromosomes with inverted repeat and telomeric structures at the end of the chromosome.

LIMITATIONS
This protocol should work for all cultured Actinobacteria. However, occasionally, it is not possible to obtain a complete genome sequence (one replicon = 1 contig) for a strain, for example, if repeats much greater than the read length are present. The entire genome sequence will be there, just fragmented, which can limit subsequent analysis.
A limiting factor for the throughput of this protocol is the number of mortars and pestles available: a clean mortar and pestle is needed per sample.
For Streptomyces strains, the genome is often linear with inverted repeat chromosome ends. This structure will often not be resolved in the assembly repeat graph and is often not fully assembled or wrongly assembled in the final Flye output. An assembly with repeats longer than the read length will generally be incomplete.

This protocol does not adopt the Short Read Eliminator (SRE) protocol and, as such, does not cover troubleshooting connected to it.
It is important to note that, while long Oxford Nanopore reads enable a complete genome assembly, they currently have a higher error rate than short-read sequencing data. Therefore, supplementing with short reads for polishing will provide an assembly that is both complete and highly accurate.

Problem 1
Not enough material in the culture.
Insufficient biomass (<150-250 mg) in the initial pellet can lead to little or no DNA being extracted in the end.

Potential solution
Increase the culture volume used to pellet biomass.
If the strain exhibits little to no growth at all, change the media/conditions used to grow the strain. Conditions that have worked previously, for instance when the strain was isolated (changed temperature, reduced rpm, or different media), are ideal options. Alternatively, consult Practical Streptomyces Genetics 9 for guidance.

Problem 2
During the extraction process, the column becomes clogged after loading the sample or any other reagent, and the sample does not pass through.

Potential solution
Check the viscosity of the sample. If it is high, recover the volume loaded on the column and dilute the sample with B3 buffer. The clogging can be overcome by applying slight, constant positive pressure with air from a syringe (10 mL). It is important to exert as little pressure as possible to minimize shearing of the DNA. Stop exerting pressure and remove the syringe from the column before the entire volume has passed through the column. Introducing air into the filter must be avoided. If pressure had to be applied to the column, this may need to be repeated in the following steps as well.
Caution: Putting additional pressure on the column puts physical stress on the DNA, which can result in further fragmentation of the DNA.

Problem 3
DNA Pellet does not dissolve.
If the DNA pellet was excessively dried, it will not easily dissolve in the TA Buffer.

Potential solution
If time is ample: Incubate the sample at 4°C for several days or weeks. It takes time for HMW DNA to dissolve and homogenize, especially after over-drying.
If time is important: Incubate the sample at 50°C and gently flick it occasionally.
If the DNA pellet was overdried in any step, it may be impossible to resuspend it sufficiently. In this case, repeat the entire extraction.

Problem 4
Low DNA concentration.
A low final concentration (< 10 ng/µL) indicates that an insufficient amount of biomass was used at the start of the extraction or that DNA was lost during the purification step (usually during the precipitation).

Potential solution
Repeat the extraction of the sample. If the sample was not ground in LN the first time, do that now.
Sequence it anyway: If the sample otherwise passed quality control, it can be sequenced, but ideally together with other samples of similar concentration, as it will otherwise skew the barcode distribution. A decrease in throughput should be expected and can be compensated for by adjusting the pooling of samples.
Perform a 1× volume SPRI clean-up and elute in a reduced volume.

Problem 5
Smears on gel.
A smear on the agarose gel indicates fragmentation of the sample or overloading of the agarose gel.

Potential solution
If the DNA concentration is higher than 200 ng/µL, dilute the sample in question to ~50 ng/µL and incubate for at least 12 h at 4°C. Repeat the agarose gel electrophoresis.
Perform an SRE clean-up to remove fragments below 10 kb. Note that 25%-75% of the DNA is lost during the SRE protocol.

Problem 6
A low A260/A280 can indicate contamination by proteins, while a low A260/A230 can indicate salt contamination.

Potential solution
Repeat the extraction process, taking care with the sample in question. If the sample was not ground in LN the first time, do that now.
Exchange the buffer by performing a 1.0× SPRI clean-up.

Problem 7
Incomplete genome assembly.
Occasionally, a genome will not fully assemble even if the read length and coverage are good (N50 > 10 kb, coverage > 50×).

Potential solution
Use Filtlong (https://github.com/rrwick/Filtlong) to remove the shortest and lowest-quality reads. We suggest removing 30%, 50%, 75% or even 90% of the data, aiming to remove the largest amount of data while still having a coverage of 50×. When all replicons are assembled into contigs where any assembly repeat graph branching can be explained by biology (i.e., the inverted repeats of Streptomyces, see Figure 5, sample E), it is indicative of a completely assembled genome, as long as the coverage is still substantial. In some cases, a coverage < 50× can yield a complete genome.
Use a different genome assembly program. Even though Flye performs consistently well, using another assembly program might help alleviate an assembly issue. We suggest using one of the assemblers benchmarked in Wick and Holt. 12
Generate more data. After stopping a sequencing run, the flow cell can be washed and reloaded with sequencing libraries from samples that did not fully assemble. This will yield a higher coverage, which might be necessary to completely assemble a genome.
For Streptomyces strains, which generally have linear genomes, often with long terminal inverted repeats, the chromosome is often split into two contigs in the final Flye assembly, or the inverted repeat is incorrectly placed on its own contig in addition to in both ends of the chromosome. If this problem is not resolved by using a different assembly program, we suggest manually concatenating the repeat graph edges and verifying their orientation by mapping the Nanopore reads and
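For the Filtlong suggestion in this solution, the amount of data to keep can be chosen by arithmetic before any filtering is run. The sketch below is an illustration, not the authors' procedure. It picks the smallest keep percentage, among those implied by removing 30%, 50%, 75% or 90% of the data, that still leaves the target coverage. The function name is hypothetical, and the result is meant to be passed to Filtlong's `--keep_percent` option, which measures the percentage by bases.

```python
def keep_percent_for_target(total_bases, genome_size, target_cov=50,
                            keep_options=(10, 25, 50, 70)):
    """Smallest keep percentage that still gives the target coverage.

    keep_options mirror removing 90%, 75%, 50% and 30% of the data.
    Returns None if even the largest option falls below the target.
    """
    for keep in sorted(keep_options):
        # Integer comparison avoids floating-point rounding at the boundary.
        if total_bases * keep >= target_cov * genome_size * 100:
            return keep
    return None


if __name__ == "__main__":
    # Hypothetical sample: 800 Mbases of reads for an assumed 8 Mb genome.
    print(keep_percent_for_target(800_000_000, 8_000_000))
```

A return value of None indicates that filtering alone would drop the sample below 50× coverage, in which case generating more data is the more promising route.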